Project Group - 8¶
Members & Student numbers:¶
Lars Zevenbergen, 4892739
Yung Ting, Lu 6221149
YuJung, Lin, 6183859
Stephen Huang, 4956311
Huib van der Veen, 6330584
Introduction¶
The COVID-19 pandemic brought about significant disruptions to global supply chains, affecting various sectors and modes of transportation. One area that experienced profound impacts was the movement of goods via air and sea. These essential channels for international trade faced distinct challenges and exhibited different patterns of disruption and recovery. The crisis led to a sudden drop in both the volume and value of freight, with air transport experiencing sharp declines due to grounding of passenger flights, while maritime shipping encountered delays and port congestions.
This research aims to examine how the pandemic affected the volume and value of goods transported by air and sea, focusing on the divergence in recovery timelines and the resilience of these transport modes. By analyzing data from before, during, and after the pandemic, the study seeks to identify key factors that influenced the speed and robustness of recovery in both air and maritime freight. Particular attention will be given to variations in the impact on different types of cargo and the adaptive responses adopted by the logistics industry.
Through this investigation, the project will provide insights into the vulnerabilities and strengths of global freight transportation systems, offering recommendations to enhance their resilience to future disruptions. The analysis will not only contribute to a better understanding of how the logistics sector can be fortified against similar crises but also help shape policies aimed at improving the flexibility and sustainability of supply chains.
Research Topic:¶
Resilience of Global Supply Chains: A Focus on Aerial and Maritime Freight Volume.
Research Objective¶
The objective of this research is to quantitatively analyze the impact of the COVID-19 pandemic on the volume and value of goods transported by air and sea. The focus will be on comparing the recovery timelines and resilience of these two transportation modes, identifying factors that contributed to different recovery patterns. This will involve examining trends in air and maritime freight from 2002 to 2024, with particular attention to the effects on various cargo types. The goal is to provide insights that can help improve the resilience and adaptability of global supply chains.
“The objective of this research is to analyze how the pandemic affected the volume and value of goods transported by air and sea, focusing on identifying key factors that influenced the differences in recovery patterns between these transport modes.”
SMART Criteria¶
- Specific: The research will focus on comparing the impact of the COVID-19 pandemic on air and sea freight, analyzing the differences in the volume and value of goods transported. It will assess how these two modes of transportation recovered and identify factors influencing their resilience. The study will also evaluate the effect on different types of cargo, such as perishable goods, high-value items, and bulk commodities.
- Measureable: The analysis will utilize quantitative data on freight volumes and values from Eurostat databases, covering the period from 2013 to 2024. Key metrics will include the percentage change in freight volumes, recovery rates, and differences in value trends between air and sea transport.
- Achievable: The research will be conducted using accessible Eurostat data on air and maritime freight. Data processing and analysis will be carried out with Python, leveraging statistical modeling to forecast hypothetical trends.
- Relevant: The study addresses an important aspect of global supply chain resilience, especially in the context of recent and potential future disruptions. It will contribute valuable insights for policymakers, logistics companies, and other stakeholders aiming to enhance the robustness of international trade.
- Time-bound: We used a dataset on air cargo transport spanning from 2003 to 2024, and freight transport spanning from 2003 to 2023, covering countries such as Belgium, Germany, France, the Netherlands, and Italy.
Main Research Question:¶
- "How did the COVID-19 pandemic influence the volume of goods transported in Europe by air and sea?”
Sub-Research Question 1:
- “How would the overall air freight trends for the five largest trading countries in Europe have developed if the pandemic had not occurred, and to what extent has any initial disruption been offset by subsequent recovery?”
Sub-Research Question 2:
- “Which commodities were most affected by the COVID pandemic, and did certain commodities experience greater changes in trading value?”
Sub-Research Question 3:
- “How did the COVID-19 pandemic affect the overall total import and export volumes by air and sea in five largest trading countries in Europe, considering each mode of transport separately?”
Data¶
The data used in this project are from EuroStat (https://ec.europa.eu/eurostat/databrowser/view/avia_gooc/default/table) and the UN Comtrade Plus Database (https://comtradeplus.un.org/). For each subquestion, the data pipeline will be presented.
Subquestion 1¶
Data Pipeline 1¶
For the subquestion 1, below is a detailed breakdown of the steps involved in our data pipeline:
Data Collection The dataset was obtained from Eurostat. The dataset covers the main europe countries such as Belgium, Germany, France, the Netherlands, and Italy. The dataset was extracted in the form of an Excel file (Freight&Mail_all_countries_2003_quarterly.xlsx) and loaded into the environment using the Python pandas library.
Data Loading The data was read into Python using the pd.read_excel() function to import the Excel spreadsheet. We specified the relevant sheet and skipped unnecessary rows to focus only on the data of interest.
Data Cleaning After loading the data, several cleaning steps were applied: Dropping null values: Rows containing null values were removed to prevent errors during analysis. Renaming columns: To make the dataset more intuitive, columns were renamed according to their corresponding year of freight transport data. Handling inconsistent formats: We ensured that all numerical columns were correctly formatted as floating-point numbers.
Build ARIMA model To analyze and forecast the volume of goods transported, we used an ARIMA (Auto-Regressive Integrated Moving Average) model to evaluate the impact of the COVID-19 pandemic on transport volumes. The dataset, which contains historical air freight volume data for the five largest trading countries in Europe, was first converted into a time series format. This format is essential for analyzing trends and making predictions based on temporal patterns. The time series data was then divided into a training set (pre-pandemic period, up to 2019) and a test set (2020 onward). The training set was used to build and fit the ARIMA model, while the test set helped evaluate the accuracy and robustness of the forecast.
Forecasting and Plotting Using the best-fitting ARIMA model based on pre-pandemic data (up to 2019), we generated a forecast for the test data (2020-2024), representing a “no-pandemic” scenario. This forecast compares with actual data during the pandemic and assesses the extent of disruption and subsequent recovery. The forecasted values and actual values were plotted to visually examine where and to what extent the pandemic impacted freight volumes.
# add all imports, including for SQ2 and 3
import pandas as pd
import numpy as np
import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
import warnings
init_notebook_mode(connected=True)
pio.renderers.default = "plotly_mimetype+notebook"
warnings.filterwarnings('ignore')
Data Visualisation¶
First, load the data and plot the data
# Load data
def load_data(file_path):
"""
Loads the data from the provided Excel file path, skipping rows that do not contain actual data.
Parameters:
file_path (str): Path to the Excel file containing the data.
Returns:
A dataframe containing the data from the Excel file.
"""
df = pd.read_excel(file_path, sheet_name='Sheet 1', skiprows=10, index_col=0)
df.dropna(how='any', inplace=True)
return df
# Load Dataset
file_path = r"data/SQ1/Freight&Mail_all_countries_2003_quarterly.xlsx"
df_air = load_data(file_path)
# Create a Plotly figure
fig = go.Figure()
x = df_air.columns
y_labels = df_air.index
# Loop through each country and add a trace for it, excluding the United Kingdom
for country in y_labels:
if country == "United Kingdom":
continue
# Replace ":" with 0 for missing data
y_values = df_air.loc[country].replace(":", 0).values
fig.add_trace(go.Scatter(x=x, y=y_values, mode='lines', name=country))
# Update layout with titles and labels
fig.update_layout(
title="Freight and Mail Air Transport by Country Over the Years",
xaxis_title="Year",
yaxis_title="Freight and Mail (tonnes)",
legend=dict(
title="Country",
x=1.05,
y=1,
font=dict(size=10)
)
)
# Rotate x-axis labels and adjust layout for better readability
fig.update_xaxes(tickangle=45)
fig.update_layout(margin=dict(t=40, r=150, b=40, l=40))
# Show the plot
fig.show()
Germany, France, Belgium, Netherlands and Italy are the five largest trading countries transporting the greatest amount of freigt and mail by air before the pandemic. Use these five countries for the further analysis.
# Load Dataset
file_path = r"data/SQ1/Freight&Mail_five_countries_2003_quarterly.xlsx"
df_air = load_data(file_path)
How would the overall air freight trends for the five largest trading countries in Europe have developed if the pandemic had not occurred, and to what extent has any initial disruption been offset by subsequent recovery?
To answer that, the seasonal ARIMA (Auto-Regressive Integrated Moving Average) model, a widely applied statistical approach for time series forecasting, is used. The ARIMA model is particularly suited for this question because it is highly effective in modeling trends, seasonal patterns, and fluctuations in time series data. Here are the steps to building a seasonal ARIMA model.
Step 1. Preprocess data¶
The dataset is first converted into a time series format for analyzing trends and making predictions based on temporal patterns. The time series data is then divided into a training set (pre-pandemic period, up to 2019) and a test set (2020 onward). The training set is used to build and fit the ARIMA model, while the test set helps evaluate the accuracy and robustness of the forecast.
# Preprocess data
def preprocess_data(df, country = 'Germany'):
"""
Divides the dataframe into two lists. Train holds data from before 2020 Q1,
and test holds data from and after 2020 Q1.
Parameters:
df (pandas.DataFrame): dataframe of the data.
country (str, optional): The country to process. Default is "Germany".
Returns:
A pandas.Series with train data and a list with test data.
"""
time_series = df.T.loc[:, country] # Germany for default
train = time_series[(time_series.index < '2020-Q1')] # training set
# val = time_series[(time_series.index < '2018-Q1')]
test = time_series[time_series.index >= '2020-Q1'] # test
return train, test
Step 2. Stationary Check¶
ARIMA models assume that the time series data is stationary, meaning its mean and variance are constant over time. A stationary time series is more predictable and easier to model. To verify stationarity, we use the Augmented Dickey-Fuller (ADF) test. The ADF test evaluates the null hypothesis (H0) that a unit root is present in the series (indicating non-stationarity). If the p-value is below a significance level (e.g., 0.05), we reject H0 suggesting that the series is stationary.
def check_stationarity(train, threshold = 0.05):
"""
Checks if a time series is stationary by applying the Augmented Dickey-Fuller (ADF) test and,
if necessary, the series is differenced to achieve stationarity. This step is critical for ARIMA
models, which require stationary data to provide reliable forecasts.
Parameters:
train (pandas.Series): The time series data to be tested and, if needed, differenced.
threshold (float, optional): The p-value threshold for stationarity. If the ADF test p-value is above this level,
differencing will be applied to make the series stationary. Default is 0.05.
Returns:
The number of differences (d) needed to achieve stationarity, where 0 indicates
that the series is already stationary.
"""
# Perform ADF test on the original series
result = adfuller(train.dropna()) # Ensure no NaN values
print("ADF Test on original training data")
print("p-value:", round(result[1], 4))
d = 0
# Check if differencing is needed (if p-value > 0.05, indicating non-stationarity)
while result[1] > threshold:
# Perform ADF test on the differenced series
train = train.diff().dropna()
result = adfuller(train)
print("\nADF Test on differenced training data")
print("p-value:", round(result[1], 4))
d += 1
else:
print(f"The null hypothesis is rejected, which indicates that the training data is stationary and does not require differencing. Return parameter d = {d}")
return d
The d reflects the d-order differencing that made the data stationary and will be used later.
Step 3. Building the seasonal ARIMA Model¶
An seasonal ARIMA model combines three components: Autoregression (AR), Differencing (I), and Moving Average (MA), denoted as ARIMA(𝑝,𝑑,𝑞)(𝑃,𝐷,𝑄,𝑠).
𝑝,𝑑,𝑞: These are the non-seasonal parameters for autoregression (AR), differencing (I), and moving average (MA), respectively.
𝑃,𝐷,𝑄: These are the seasonal counterparts, controlling seasonal autoregression, differencing, and moving average terms.
𝑠: The seasonal period, e.g., 4 for quarterly data.
Various combinations of 𝑝, 𝑞, 𝑃, and 𝑄 are tested, with the model selection based on minimizing the Akaike Information Criterion (AIC). AIC is a traditional indicator that helps identify the best-fitting model. The model with lower AIC has better goodness of fit without being overly complex, which could lead to overfitting. Note that c and 𝐷 are already set in the previous step.
def find_best_combination(train, d, display_models = True):
"""
Identifies the optimal ARIMA or seasonal ARIMA model parameters for a given time series dataset
by iterating over combinations of non-seasonal (p, q) and seasonal (P, Q) parameters. The
best model is selected based on the Akaike Information Criterion (AIC). Lower AIC is better.
Parameters:
train (pandas.Series): The time series data for training the model.
d (int) The degree of differencing required for stationarity, previously determined.
display_models (bool, optional): If True, displays a summary of the top models with the lowest AIC scores. Default is True.
Returns:
The ARIMA or seasonal ARIMA model with the lowest AIC value.
Notes:
Seasonal period is fixed to 4 (for quarterly data).
"""
# Convert the training data to an array
train_array = np.asarray(train, dtype=float)
# Define ranges for non-seasonal p and q
non_seasonal_p_values = range(1, 4) # Adjust as needed
non_seasonal_q_values = range(1, 4) # Adjust as needed
# Define ranges for seasonal p and q
seasonal_p_values = range(1, 4) # Adjust as needed
seasonal_q_values = range(1, 4) # Adjust as needed
# Initialize lists to store results
non_seasonal_results = []
seasonal_results = []
# Loop over combinations of non-seasonal p and q values
print("\nStart looping over combinations\n")
for p in non_seasonal_p_values:
for q in non_seasonal_q_values:
model_nonseasonal = ARIMA(train_array, order=(p, d, q))
result_nonseasonal = model_nonseasonal.fit()
non_seasonal_results.append((p, q, result_nonseasonal.aic, result_nonseasonal))
# Loop over combinations of seasonal p and q values
for sp in seasonal_p_values:
for sq in seasonal_q_values:
model_seasonal = ARIMA(train_array, order=(p, d, q), seasonal_order=(sp, d, sq, 4))
result_seasonal = model_seasonal.fit()
seasonal_results.append((p, q, result_seasonal.aic, sp, sq, result_seasonal))
print("Loop over combinations done\n")
# Combine results and find the model with the smallest AIC
all_results = non_seasonal_results + seasonal_results
best_model_info = min(all_results, key=lambda x: x[2]) # Get the model with the smallest AIC
best_model_aic = best_model_info[2]
best_model = best_model_info[-1]
# Determine whether the best model is seasonal or non-seasonal
if len(best_model_info) == 4:
model_type = "Non-seasonal model"
else:
model_type = "Seasonal model"
print(f"The best model is {model_type} p={best_model_info[0]}, q={best_model_info[1]}, P={best_model_info[3]}, Q={best_model_info[4]} and AIC={best_model_aic}\n")
if display_models == True:
# Display the training results
non_seasonal_df = pd.DataFrame(non_seasonal_results, columns=['p', 'q', 'AIC', 'Model']).sort_values(by='AIC')
seasonal_df = pd.DataFrame(seasonal_results, columns=['p', 'q', 'AIC', 'P', 'Q', 'Model']).sort_values(by='AIC')
# Display the training parameters
print("Non-seasonal models:")
print(non_seasonal_df.loc[:, ['p', 'q', 'AIC']].head(3))
print("\nSeasonal model:")
print(seasonal_df.loc[:, ['p', 'q', 'AIC', 'P', 'Q']].head(3))
else:
pass
return best_model
Step 4. Forecast and Plot¶
Using the best-fitting seasonal ARIMA model based on pre-pandemic data (up to 2019), we generate a forecast for test data (during 2020-2024), representing a “no-pandemic” scenario.
Step 5. Recovery time¶
The recovery time is defined as the time when the actual data (green line) aligns with or have the closet gap with the forecasted values (blue line), which indicates that transport volumes have returned to expected levels if the pandemic had not occurred.
def forecast_plot(best_model, train, test):
"""
Generates a forecast for the test period using the best-fitting seasonal ARIMA model and plots
the training data, actual test data, and forecasted values. This visualization aims to model a
"no-pandemic" scenario by forecasting based on pre-pandemic data and assessing recovery time by
identifying when actual volumes return close to expected levels.
Parameters:
best_model (statsmodels.tsa.arima.model.ARIMAResults): The optimized ARIMA model fitted on the training data.
train (pandas.Series): Historical time series data used to train the model, ending in 2019.
test (pandas.Series): Time series data for the test period (2020-2024) to compare against the forecast.
Returns:
None
Displays an interactive plot of the training data, forecast, and test data. Also, prints the
recovery time, defined as the time point where the actual data is closest to the forecasted
values, suggesting alignment with pre-pandemic trends.
Notes:
The function computes the absolute difference between the forecast and actual values for each
point in the test period to determine the "recovery time"—the point when the volume aligns most
closely with the forecast, indicating a return to expected transport levels.
"""
train_array = np.asarray(train, dtype=float)
test_array = np.asarray(test, dtype=float)
# Forecast future values using the best model
test_array = np.insert(test_array, 0, train_array[-1])
forecast_index = np.insert(test.index.values, 0, train.index[-1]) # Set forecast index to align with test data
forecast_steps = len(test) # Number of steps to forecast, matching the test set length
forecast = best_model.get_forecast(steps=forecast_steps)
forecast_mean = forecast.predicted_mean # Forecasted mean values
forecast_mean = np.insert(forecast_mean, 0, train_array[-1])
# forecast_ci = pd.DataFrame(forecast.conf_int(), index=forecast_index[0:-1], columns=['lower', 'upper']) # Confidence intervals
# Calculate yearly difference and cumulative sum
absolute_difference = np.abs(test_array[2:] - forecast_mean[2:]) # Calculate yearly difference excluding first aligned value
# Find the index of the smallest difference
recovery_index = np.argmin(absolute_difference)
# Get the corresponding date for the closest recovery time
recovery_time = forecast_index[recovery_index + 2] # +1 to account for the shifted index
# Display the closest recovery time and its corresponding difference
print("\nClosest Recovery Time:", recovery_time)
print(f"Delay at Closest Recovery Time:{round(absolute_difference[recovery_index], 3)} (tonnes)\n")
# Plot
# Create the figure
fig = go.Figure()
# Add the training data line
fig.add_trace(go.Scatter(
x=train.index,
y=train_array,
mode='lines',
name='Training Data',
line=dict(color='royalblue')
))
# Add the actual data line
fig.add_trace(go.Scatter(
x=forecast_index,
y=test_array,
mode='lines',
name='Actual Data',
line=dict(color='orangered')
))
# Add the forecast line
fig.add_trace(go.Scatter(
x=forecast_index,
y=forecast_mean,
mode='lines',
name='Forecast',
line=dict(color='seagreen')
))
# Add the confidence interval as a shaded area
# fig.add_trace(go.Scatter(
# x=forecast_index[0:-1].tolist() + forecast_index[0:-1][::-1].tolist(),
# y=forecast_ci['upper'].tolist() + forecast_ci['lower'][::-1].tolist(),
# fill='toself',
# fillcolor='rgba(0, 0, 255, 0.2)',
# line=dict(color='rgba(255, 255, 255, 0)'),
# hoverinfo="skip",
# showlegend=True,
# name='Confidence Interval'
# ))
# Update layout
fig.update_layout(
title=f'Best ARIMA Model Forecast of Air Transport Volume in {train.name}',
xaxis_title='Time',
yaxis_title='Volume of Goods Transported (tonnes)',
legend=dict(x=0.01, y=0.99),
xaxis=dict(tickangle=45)
)
# Show the figure
fig.show()
Let's start from Germany.
Germany¶
train, test = preprocess_data(df_air, country = "Germany")
d = check_stationarity(train)
best_model = find_best_combination(train, d, display_models = True)
forecast_plot(best_model, train, test)
ADF Test on original training data
p-value: 0.3957
ADF Test on differenced training data
p-value: 0.0023
The null hypothesis is rejected, which indicates that the training data is stationary and does not require differencing. Return parameter d = 1
Start looping over combinations
Loop over combinations done
The best model is Seasonal model p=2, q=2, P=1, Q=1 and AIC=1518.8166590460717
Non-seasonal models:
p q AIC
0 1 1 1655.715411
3 2 1 1659.054358
6 3 1 1659.671857
Seasonal model:
p q AIC P Q
36 2 2 1518.816659 1 1
0 1 1 1519.620156 1 1
4 1 1 1519.807188 2 2
Closest Recovery Time: 2020-Q3
Delay at Closest Recovery Time:8927.282 (tonnes)
Netherlands¶
train, test = preprocess_data(df_air, country = "Netherlands")
d = check_stationarity(train)
best_model = find_best_combination(train, d, display_models = False)
forecast_plot(best_model, train, test)
ADF Test on original training data p-value: 0.5825 ADF Test on differenced training data p-value: 0.0027 The null hypothesis is rejected, which indicates that the training data is stationary and does not require differencing. Return parameter d = 1 Start looping over combinations Loop over combinations done The best model is Seasonal model p=1, q=1, P=2, Q=2 and AIC=1431.978071528646 Closest Recovery Time: 2020-Q3 Delay at Closest Recovery Time:4105.561 (tonnes)
France¶
train, test = preprocess_data(df_air, country = "France")
d = check_stationarity(train)
best_model = find_best_combination(train, d, display_models = False)
forecast_plot(best_model, train, test)
ADF Test on original training data p-value: 0.706 ADF Test on differenced training data p-value: 0.0038 The null hypothesis is rejected, which indicates that the training data is stationary and does not require differencing. Return parameter d = 1 Start looping over combinations Loop over combinations done The best model is Seasonal model p=1, q=1, P=1, Q=1 and AIC=1508.8928874466699 Closest Recovery Time: 2022-Q3 Delay at Closest Recovery Time:567.963 (tonnes)
Belgium¶
train, test = preprocess_data(df_air, country = "Belgium")
d = check_stationarity(train)
best_model = find_best_combination(train, d, display_models = False)
forecast_plot(best_model, train, test)
ADF Test on original training data p-value: 0.6773 ADF Test on differenced training data p-value: 0.0186 The null hypothesis is rejected, which indicates that the training data is stationary and does not require differencing. Return parameter d = 1 Start looping over combinations Loop over combinations done The best model is Seasonal model p=2, q=3, P=2, Q=1 and AIC=1459.4987097091096 Closest Recovery Time: 2022-Q1 Delay at Closest Recovery Time:15160.374 (tonnes)
Italy¶
train, test = preprocess_data(df_air, country = "Italy")
d = check_stationarity(train)
best_model = find_best_combination(train, d, display_models = False)
forecast_plot(best_model, train, test)
ADF Test on original training data p-value: 0.959 ADF Test on differenced training data p-value: 0.2826 ADF Test on differenced training data p-value: 0.0 The null hypothesis is rejected, which indicates that the training data is stationary and does not require differencing. Return parameter d = 2 Start looping over combinations Loop over combinations done The best model is Seasonal model p=2, q=1, P=1, Q=1 and AIC=1272.1595460783876 Closest Recovery Time: 2021-Q1 Delay at Closest Recovery Time:73414.697 (tonnes)
Subquestion 2¶
Data Pipeline 2¶
For the subquestion 2, the primary dataset was obtained from the UN Comtrade Plus Database, focusing on trade data for specific commodity groups from 2019 to 2023. This period covers pre-pandemic, during-pandemic, and post-pandemic phases, enabling us to study pandemic-related impacts on logistics. Below is a detailed breakdown of the steps involved in our data pipeline:
Data Collection
- Source: UN Comtrade Plus Database (comtradeplus.un.org)
- Period: 2019 - 2023
- Countries: Belgium, Germany, France, the Netherlands, and Italy
- Commodity Codes:
- HS 30: Pharmaceutical Products
- HS 85: Electrical Machiney and Equipment
- HS 90: Optical, Medical, and Surgical Instruments
- HS 27: Mineral Fuels and Oils
- HS 07 & HS 08 & HS 02 : Edible Vegetables, Fruits, Nuts and Meat
- Justification for Selection: These HS codes represent sectors with notable fluctuations due to the pandemic, giving insight into logistics adaptations:
- Pharmaceuticals and Medical Devices reflect emergency supply needs.
- Electrical Machinery reveals demand spikes and supply chain bottlenecks.
- Perishables (Vegetables and Fruits) demonstrate challenges in time-sensitive logistics.
- Mineral Fuels capture demand shock responses in maritime freight.
Data Loading: The data files for each commodity code were loaded into the environment using the load_and_display_data function, which allows us to specify COVID-19-related data and focus on total trade value.
load_and_display_data(df_path, "COVID-19", "Total Oil and Gas Trade Data Across All Countries (Import and Export)", "Total Trade Value ($ Dollar)")withdf_path = r'data/SQ2/Sorted_TradeData-27.xlsx'load_and_display_data(df_path, "COVID-19", "Total Electrical Machinery and Equipment Trade Data Across All Countries (Import and Export)", "Trade Value ($ Dollar)")withdf_path = r'data/SQ2/Sorted_TradeData_85.xlsx'load_and_display_data(df_path, "COVID-19", "Total Perishables Trade Data Across All Countries (Import and Export)", "Total Trade Value ($ Dollar)")withdf_path = r'data/SQ2/TradeData020708.xlsx'load_and_display_data(df_path, "COVID-19", "Total Pharmaceutical Products Trade Data Across All Countries (Import and Export)", "Trade Value ($ Dollar)")withdf_path = r'data/SQ2/Sorted_TradeData_30.xlsx'
Data Cleaning After loading the data, several cleaning steps were applied to ensure consistency and accuracy: Dropping Null Values: Rows containing null values were removed to prevent any bias in the analysis. Renaming Columns: Columns were renamed to ensure the dataset is clear and intuitive. Handling Inconsistent Formats: All numeric columns were standardized to floating-point numbers to enable accurate aggregation and analysis.
Data Aggregation Once cleaned, the dataset was aggregated by HS Code to calculate the total trade value for each commodity group, providing a unified metric to analyze and compare across countries and years.
import pandas as pd
import plotly.graph_objects as go
def load_and_display_data(df_path, annotation_text, title_template_total, y_axis_title):
"""
Loads the data from the provided Excel file path, displays the first few rows,
and visualizes the data for import and export volumes over time per country using Plotly.
Additionally, calculates and plots the total import and export values across all countries.
Parameters:
df_path (str): Path to the Excel file containing the trade data.
title_template_total (str): Template for the plot title, including placeholders for {country}.
annotation_text (str): Text for annotation (e.g., "COVID-19").
y_axis_title (str): Label for the y-axis.
Returns:
None
"""
# Load the data from the Excel file
trade_data = pd.read_excel(df_path)
# Pivot the data to organize it for plotting
pivot_data = trade_data.pivot_table(values='primaryValue',
index=['period', 'reporterDesc'],
columns='flowDesc',
aggfunc='sum').reset_index()
# Calculating total import and export values across all countries for each year
total_trade = pivot_data.groupby('period')[['Import', 'Export']].sum().reset_index()
fig = go.Figure()
fig.add_trace(go.Scatter(x=total_trade['period'], y=total_trade['Import'],
mode='lines+markers', name='Total Import'))
fig.add_trace(go.Scatter(x=total_trade['period'], y=total_trade['Export'],
mode='lines+markers', name='Total Export'))
fig.update_layout(
title=title_template_total,
xaxis_title="Year",
yaxis_title=y_axis_title,
legend_title="Trade Flow",
xaxis=dict(tickmode='linear'),
shapes=[dict(
type='line', x0=2019, x1=2019, y0=0,
y1=max(total_trade['Import'].max(), total_trade['Export'].max()),
line=dict(color='red', dash='dash')
)]
)
fig.add_annotation(x=2019, y=max(total_trade['Import'].max(), total_trade['Export'].max()),
text=annotation_text, showarrow=True, arrowhead=1, ax=-40, ay=-30)
fig.show()
df_path = r'data/SQ2/Sorted_TradeData-27.xlsx'
load_and_display_data(df_path, "COVID-19", "Total Oil and Gas Trade Data Across All Countries (Import and Export)" ,"Total Trade Value ($ Dollar)")
- Oil and Gas:
- Trend Overview: The total trade value of oil and gas experienced fluctuations before the pandemic, with a gradual decline from 2013 to 2017, stabilization through 2018, and a mild rise in 2019.
- Impact of COVID-19: The pandemic severely affected oil and gas trade in 2020, as indicated by a sharp decline in both import and export values, reflecting reduced global demand and transportation disruptions.
- Post-Pandemic Recovery: Recovery was significant from 2021 to 2022, marked by a strong rebound in both imports and exports. However, a noticeable drop in 2023 indicates volatility and economic or geopolitical challenges impacting trade stability.
df_path = r'data/SQ2/Sorted_TradeData_30.xlsx'
load_and_display_data(df_path, "COVID-19", "Total Pharmaceutical Products Trade Data Across All Countries (Import and Export)", "Trade Value ($ Dollar)")
- Pharmaceutical Products:
- Trend Overview: The pharmaceutical sector showed a steady upward trend in trade values from 2013 to 2019.
- Impact of COVID-19: In contrast to other sectors, pharmaceutical exports surged during the pandemic (2020-2021), reflecting increased global demand for medical supplies and drugs. Imports also saw a moderate increase during this period.
- Post-Pandemic Recovery: The growth trend continued post-2020, with exports stabilizing at a higher level compared to the pre-pandemic period. The pharmaceutical trade sector's significant and sustained increase underlines its critical role during and after the crisis.
df_path = r'data/SQ2/Sorted_TradeData_85.xlsx'
load_and_display_data(df_path, "COVID-19", "Total Electrical Machinery and Equipment Trade Data Across All Countries (Import and Export)", "Trade Value ($ Dollar)")
- Electrical Machinery and Equipment:
- Trend Overview: The trade value for both imports and exports of electrical machinery and equipment demonstrated steady growth from 2013 to 2019. However, a notable dip in 2020 aligns with the onset of the COVID-19 pandemic.
- Impact of COVID-19: The pandemic caused a disruption in trade, visible as a decline in both import and export values in 2020. This was followed by a recovery trend from 2021 onwards, with imports rebounding more significantly than exports.
- Post-Pandemic Recovery: By 2023, import values surpassed pre-pandemic levels, indicating resilience and a robust recovery. Exports, although recovered, showed a slower growth rate compared to imports, suggesting potential supply chain or market challenges.
df_path = r'data/SQ2/TradeData020708.xlsx'
load_and_display_data(df_path, "COVID-19", "Total Perishables Trade Data Across All Countries (Import and Export)", "Total Trade Value ($ Dollar)")
- Perishables (Vegetables, Fruits, and Nuts):
- Trend Overview: The trade value of perishables showed stable growth between 2013 and 2019, with minor fluctuations.
- Impact of COVID-19: Unlike other commodities, perishables maintained relatively consistent trade values during 2020, suggesting their essential nature and sustained demand even during the pandemic.
- Post-Pandemic Recovery: Trade values for perishables demonstrated slight growth post-2020, indicating resilience in this sector. The steady import and export values imply that the logistics challenges posed by the pandemic were managed effectively, possibly due to the prioritization of essential goods.
Subquestion 3¶
Data Pipeline 3¶
For the subquestion 3, the data that needs to be obtained represents the five countries import and export tradeflow, which is divided into air and sea transport mode, from 2013 to 2023. We initially planned to analyze five countries, but besides Germany, the data from others are not sufficient. Therefore, we will focus on Germany in this section.
- Data Collection: The data for sub question 3 was acquired from UN Comtrade Plus Database, and we have two different dataset. The first dataset is set to air as the mode of transport, and the second is set to sea. Other data settings are shown below. Trade Flows: exports and imports Reporters: Germany Customs Code: Total customs procedure codes Partners/ 2nd partner: world Breakdown mode: plus Periode: 2013-2023
- Data Loading: The data was read into Python using pd.read_excel() to import the Excel spreadsheet.
- Data Analysis: Take first graph for example:
- Filter both dataframe to select rows where the "Trade Flow" column is equal to "M" (import).
- Remove dollar signs ($) and commas using replace() and convert the values to floats with astype(float) for numerical analysis.
- Add a new column "Transport Mode" to each filtered data frame to label the transport mode.
- Combine the two filtered dataframes.
- Draw the graph by using plotly.
We initially planned to analyze five countries, but due to insufficient data in the database, we will focus on Germany for subquestion 3.
The graph below shows changes in Germany's imports from 2013 to 2023, divided into sea and air freight. A significant drop can be found in sea imports in 2020, and then it has a steady recovery. Germany sea’s impot trade flow had reached 298,277,672,128 in 2022, which is the highest trade value for sea freight in the past decade.
On the other hand, air freight was less affected by the pandemic. It only had a slight decline in 2019 and gradually increased after then. Like sea import, air import freight also reached a highest trade value 162,932,154,436 in 2022.
Lastly, we compare Germany's sea and air import trade values. Over the past decade, Germany's sea import trade value has been about twice as high as air import trade value. The smallest gap was 70,306,572,928 in 2020. Sea imports were heavily impacted by the pandemic, while air imports were less affected. In 2022, because sea imports recovered quickly, the gap had grown much larger, thus reaching a maximum difference of 205,620,203,466.
file_path = "data/SQ3/Germany_Air_Trade_Data.xlsx"
file_path_sea = "data/SQ3/Germany_Sea_Trade_Data.xlsx"
df_germany = pd.read_excel(file_path)
df_germany_sea = pd.read_excel(file_path_sea)
df_germany_m = df_germany[df_germany["Trade Flow"] == "M"].copy()
df_germany_sea_m = df_germany_sea[df_germany_sea["Trade Flow"] == "M"].copy()
df_germany_m["Trade Value (US$)"] = df_germany_m["Trade Value (US$)"].replace('[\\$,]', '', regex=True).astype(float)
df_germany_sea_m["Trade Value (US$)"] = df_germany_sea_m["Trade Value (US$)"].replace('[\\$,]', '', regex=True).astype(float)
df_germany_m["Transport Mode"] = "Air"
df_germany_sea_m["Transport Mode"] = "Sea"
df_combined = pd.concat([df_germany_m, df_germany_sea_m])
fig = px.line(
df_combined,
x="Period",
y="Trade Value (US$)",
color="Transport Mode",
title="Trade Flow (Import) for Germany by Year",
labels={"Period": "Year", "Trade Value (US$)": "Trade Value (US$)"}
)
fig.show()
file_path = "data/SQ3/Germany_Air_Trade_Data.xlsx"
file_path_sea = "data/SQ3/Germany_Sea_Trade_Data.xlsx"
df_germany = pd.read_excel(file_path)
df_germany_sea = pd.read_excel(file_path_sea)
df_germany_x = df_germany[df_germany["Trade Flow"] == "X"].copy()
df_germany_sea_x = df_germany_sea[df_germany_sea["Trade Flow"] == "X"].copy()
df_germany_x["Trade Value (US$)"] = df_germany_x["Trade Value (US$)"].replace('[\\$,]', '', regex=True).astype(float)
df_germany_sea_x["Trade Value (US$)"] = df_germany_sea_x["Trade Value (US$)"].replace('[\\$,]', '', regex=True).astype(float)
df_germany_x["Transport Mode"] = "Air"
df_germany_sea_x["Transport Mode"] = "Sea"
df_combined = pd.concat([df_germany_x, df_germany_sea_x])
fig = px.line(
df_combined,
x="Period",
y="Trade Value (US$)",
color="Transport Mode",
title="Trade Flow (Export) for Germany by Year",
labels={"Period": "Year", "Trade Value (US$)": "Trade Value (US$)"}
)
fig.show()
In conclusion, Germany's sea and air exports both faced declines around 2020 due to the pandemic, while only air import did not experience that. The overall trend shows that sea and air exports followed similar recovery patterns over past ten years, and this keeps the gap between them consistent. However, the overall trends for sea and air imports were different and showed more dramatic changes compared to sea and air exports in Germany.
Conclusion¶
The analysis of the COVID-19 pandemic’s impact on European trade, focusing on air and sea freight across Germany, France, Belgium, the Netherlands, and Italy, shows distinct trends in resilience and recovery within the logistics sector. For air transport, a seasonal ARIMA model was employed to gauge the pandemic’s effects and to model a hypothetical scenario without COVID-19. The results show that recovery timelines varied significantly by country. Germany and the Netherlands showed relatively rapid recoveries, signaling strong adaptive capacities in their logistics operations. In contrast, Belgium and Italy took much longer to return to their expected growth levels, reflecting the uneven effects of the pandemic on European air freight and underscoring the necessity for localized crisis management strategies.
In examining specific sectors, the analysis highlights how the pandemic disrupted trade patterns in oil and gas, pharmaceuticals, electrical machinery, and perishables. The oil and gas sector experienced a pronounced trade drop in 2020, followed by a rebound in 2021-2022, but faced ongoing volatility in 2023. Pharmaceuticals, however, saw an increase in trade during the pandemic, underscoring their essential role and heightened demand for medical supplies. Electrical machinery faced initial declines, yet imports rebounded more robustly than exports, revealing persistent supply chain issues. Perishables maintained consistent trade levels, demonstrating their priority in logistics and essential nature throughout the pandemic. These sectoral insights illustrate varied resilience levels and sector-specific challenges in recovery.
A closer look of Germany’s air and sea freight data from 2013 to 2023 further clarifies these patterns. Germany’s sea imports saw a sharp decline in 2020, followed by a strong recovery, reaching record trade values by 2022. In contrast, air imports showed resilience, with only a slight 2019 dip before achieving their highest values in 2022. Germany’s sea and air exports both faced declines around 2020 but followed similar recovery patterns, resulting in a steady gap between the two. The differences in impact highlight distinct vulnerabilities and adaptive responses in Germany’s import and export sectors across transport modes.
Overall, this analysis highlights the profound and multifaceted impact of the COVID-19 pandemic on Europe’s logistics landscape, specifically in the air and sea transport sectors. The pandemic’s effects on the volume of goods transported across these modes were pronounced yet diverse, influenced by a complex interplay of sector-specific demand, supply chain adaptability, and national-level resilience. The air freight sector, for instance, recovered more quickly in countries like Germany and the Netherlands, where adaptable logistics infrastructures enabled a quicker return to pre-pandemic projections. However, other countries such as Belgium and Italy encountered prolonged disruptions, indicating vulnerabilities in their logistics systems that slowed recovery rates.
The sectoral analysis also revealed differing impacts across industries, highlighting how essential sectors like pharmaceuticals experienced growth amid heightened demand, while industries like oil and gas, more susceptible to global demand fluctuations, faced significant trade volatility. Electrical machinery, too, experienced a challenging recovery, with a notable lag in export growth, underscoring ongoing supply chain pressures. In contrast, the perishables sector managed stable trade levels, demonstrating the robustness of essential goods logistics, even under crisis conditions. This sector-specific resilience suggests that prioritization and supply chain management strategies for essential goods were largely effective across European countries, ensuring continuity of critical supplies despite the broader disruptions.
Germany’s import and export data further illustrated these variations, with sea imports taking a sharp hit in 2020 before surging to record levels by 2022, while air imports maintained a more stable growth trajectory with limited pandemic-related impact. Export trends for both modes mirrored each other, experiencing declines during the peak of the pandemic and rebounding in parallel. This contrast in the effects on imports and exports suggests that Germany’s logistics strategies were more resilient to the pandemic’s pressures on air import flows, while sea imports faced significant initial challenges before recovering.
This analysis shows that while certain areas in Europe were able to adapt quickly, others were significantly impacted by prolonged disruptions, emphasizing that a one-size-fits-all approach is not enough for managing logistics in future crises. Tailored responses that consider both sectoral demands and regional logistical capacities will be crucial in bolstering resilience and ensuring a more balanced recovery in the face of future global challenges. The pandemic has not only highlighted the importance of flexible and adaptive logistics strategies but also illuminated areas in need of reinforcement to minimize the adverse effects of such disruptions on essential and non-essential goods alike.
Discussion¶
In examining the COVID-19 pandemic’s impact on European air and sea freight, we focused on five countries (Germany, France, Belgium, the Netherlands, and Italy) with the highest air freight volumes. This selection aimed to capture trends in key logistics hubs, as these countries’ high transport volumes reflect a broad spectrum of regional trade dynamics. However, limiting the analysis to only five countries means that smaller but potentially resilient markets, such as those in Eastern Europe, were not examined, which could affect the generalizability of the results. Future research could expand this scope to include a more diverse range of countries for a broader perspective on European air freight recovery patterns.
For subquestion two, the study emphasized key commodities—oil and gas, pharmaceuticals, electrical machinery, and perishables. These sectors were selected to capture varying degrees of essentiality and demand shifts during the pandemic, with pharmaceuticals representing essential goods, while oil and gas displayed demand sensitivity to economic shifts. This selection revealed valuable insights into how essential sectors maintained or even grew their trade volumes, as seen with pharmaceuticals, while other industries like oil and gas faced greater volatility. Nevertheless, this categorization may overlook nuances within these broad sectors. For example, pharmaceuticals cover a range of subcategories with different demand patterns. Future analyses could benefit from a more granular breakdown within these categories to better capture such variations.
Subquestion three narrowed the focus to Germany alone, as it represented the highest trading volumes in both sea and air freight among the selected countries. Due to data limitations from other countries, Germany was used as a case study to explore specific impacts and recovery trajectories in depth. While Germany’s significant trade volume provides a representative perspective on recovery, using only one country limits the study’s findings. Differences in logistics infrastructure and government response across Europe could result in unique recovery patterns not captured in Germany’s data alone. Future studies could broaden the analysis to include multiple countries if data becomes available, offering a more comprehensive view of regional responses.
Overall, this study’s findings highlight the varied resilience of European logistics during the pandemic, shaped by country-specific capacities, sectoral demands, and data constraints. By addressing these limitations and expanding the analysis, future research could strengthen the understanding of how to build more resilient logistics frameworks in response to global crises.
Contribution Statement¶
Lars Zevenbergen:
introduction, data pipeline 2 coding subquestion 2, conclusion and discussion
Yung Ting, Lu:
coding for data analysis and visualization, graph description for subquestion 3
YuJung, Lin:
the code and text for Subquestion 1, data pipeline 1, forecast the impact
Stephen Huang:
data search, seasonal forecast, organizing project structure.
Huib van der Veen:
introduction, data pipeline 2, coding subquestion 2 and writing, conclusion and discussion